Precise Information Control in Long-Form Text Generation
He, Jacqueline, Yen, Howard, Li, Margaret, Li, Shuyue Stella, Zeng, Zhiyuan, Shi, Weijia, Tsvetkov, Yulia, Chen, Danqi, Koh, Pang Wei, Zettlemoyer, Luke
A central challenge in language models (LMs) is faithfulness hallucination: the generation of information unsubstantiated by input context. To study this problem, we propose Precise Information Control (PIC), a new task formulation that requires models to generate long-form outputs grounded in a provided set of short self-contained statements, without adding any unsupported ones. PIC includes a full setting that tests a model's ability to include exactly all input claims, and a partial setting that requires the model to selectively incorporate only relevant claims. We present PIC-Bench, a benchmark of eight long-form generation tasks (e.g., summarization, biography generation) adapted to the PIC setting, where LMs are supplied with well-formed, verifiable input claims. Our evaluation of a range of open and proprietary LMs on PIC-Bench reveals that, surprisingly, state-of-the-art LMs still hallucinate against user-provided input in over 70% of generations. To alleviate this lack of faithfulness, we introduce a post-training framework that uses a weakly supervised preference data construction method to train an 8B PIC-LM with stronger PIC ability--improving from 69.1% to 91.0% F1 in the full PIC setting. When integrated into end-to-end factual generation pipelines, PIC-LM improves exact match recall by 17.1% on ambiguous QA with retrieval, and factual precision by 30.5% on a birthplace fact-checking task, underscoring the potential of precisely grounded generation.
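The full-PIC F1 reported above can be pictured as claim-level precision (output claims supported by the input set) and recall (input claims covered by the output). The sketch below is purely illustrative and is not the benchmark's actual checker: the `supported` oracle here is exact string matching, a stand-in for the entailment-style verification a real evaluation would use.

```python
def pic_f1(output_claims, input_claims, supported):
    """Claim-level F1: precision over output claims backed by the input
    set, recall over input claims covered by the output.
    `supported(a, b)` is a stand-in entailment check."""
    tp_prec = sum(any(supported(c, i) for i in input_claims) for c in output_claims)
    tp_rec = sum(any(supported(i, c) for c in output_claims) for i in input_claims)
    precision = tp_prec / len(output_claims) if output_claims else 0.0
    recall = tp_rec / len(input_claims) if input_claims else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy check with exact match as the "entailment" oracle.
inp = ["Marie Curie was born in Warsaw.", "She won two Nobel Prizes."]
out = ["Marie Curie was born in Warsaw.", "She won two Nobel Prizes.",
       "She was born in 1867."]  # unsupported claim -> precision penalty
exact = lambda a, b: a == b
print(round(pic_f1(out, inp, exact), 3))  # -> 0.8
```

The unsupported third output claim drops precision to 2/3 while recall stays at 1.0, giving F1 = 0.8; this is how hallucinated additions are penalized even when every input claim is covered.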
CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality
Zhang, Junyan, Liu, Shuliang, Liu, Aiwei, Gao, Yubo, Li, Jungang, Gu, Xiaojie, Hu, Xuming
Watermarking technology is a method used to trace the usage of content generated by large language models. However, many existing sentence-level watermarking techniques depend on arbitrary segmentation or generation processes to embed watermarks, which can limit the availability of appropriate sentences. This limitation, in turn, compromises the quality of the generated response. To address the challenge of balancing high text quality with robust watermark detection, we propose CoheMark, an advanced sentence-level watermarking technique that exploits the cohesive relationships between sentences for better logical fluency. The core methodology of CoheMark involves selecting sentences through trained fuzzy c-means clustering and applying specific next sentence selection criteria. Experimental evaluations demonstrate that CoheMark achieves strong watermark strength while exerting minimal impact on text quality.

In recent years, the rapid advancement of large language models (LLMs) has revolutionized natural language processing (OpenAI, 2023; Yang et al., 2024; Touvron et al., 2023). This technological leap, while marking a significant milestone in artificial intelligence, has also brought about unprecedented challenges (Xu et al., 2024; Chen et al., 2023a; Mazeika et al., 2024). A major concern is that large language models can be exploited to generate false information and automated spam (Mirsky et al., 2023). To address this growing concern, researchers have begun focusing on developing various technologies to monitor AI-generated text and its usage. One effective way to track the usage of generated text is through watermarking, which involves embedding imperceptible information into the text (Kirchenbauer et al., 2023a; Kuditipudi et al., 2023; Zhao et al., 2023; Giboulot & Furon, 2024). This makes it easier to detect and track the text for potential misuse.
Compared to token-level watermarking methods, sentence-level watermarking is advantageous for preserving the internal semantic fluency within individual sentences and provides greater robustness.
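The fuzzy c-means component of such a scheme can be sketched as follows. This is a generic soft-clustering sketch over stand-in 2-D "sentence embeddings", and the `pick_next` criterion (most strongly belonging to a key-derived target cluster) is an assumption for illustration, not CoheMark's published selection rule.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: returns (membership matrix U of shape (n, c),
    cluster centers of shape (c, d))."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m                                   # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        U = memberships(X, centers, m)                # standard update step
    return U, centers

def memberships(X, centers, m=2.0):
    """Soft membership of each row of X with respect to fixed centers."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
    U = 1.0 / d ** (2.0 / (m - 1.0))
    return U / U.sum(axis=1, keepdims=True)

def pick_next(candidates, centers, target_cluster, m=2.0):
    """Hypothetical selection criterion: index of the candidate sentence
    most strongly belonging to a (key-derived) target cluster."""
    U = memberships(candidates, centers, m)
    return int(np.argmax(U[:, target_cluster]))

# Toy 2-D stand-ins for sentence embeddings; real use would embed sentences.
X = np.array([[0.0, 0.0], [0.4, 0.2], [10.0, 10.0], [10.2, 9.7]])
U, C = fuzzy_cmeans(X, c=2)
print(U.argmax(axis=1))  # hard labels separate the two blobs
```

Detection would then test whether the sequence of cluster assignments in a suspect text follows the keyed pattern more often than chance, which is why soft (rather than hard) clustering matters: borderline sentences still carry graded membership evidence.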
SteLLA: A Structured Grading System Using LLMs with RAG
Qiu, Hefei, White, Brian, Ding, Ashley, Costa, Reinaldo, Hachem, Ali, Ding, Wei, Chen, Ping
Large Language Models (LLMs) have shown strong general capabilities in many applications. However, making them reliable tools for specific tasks such as automated short answer grading (ASAG) remains a challenge. We present SteLLA (Structured Grading System Using LLMs with RAG), in which a) a Retrieval Augmented Generation (RAG) approach empowers LLMs on the ASAG task by extracting structured information from highly relevant and reliable external knowledge, based on the instructor-provided reference answer and rubric, and b) an LLM performs a structured, question-answering-based evaluation of student answers to provide analytical grades and feedback. A real-world dataset containing students' exam answers was collected from a college-level Biology course. Experiments show that our proposed system achieves substantial agreement with the human grader while providing breakdown grades and feedback on all the knowledge points examined in the problem. A qualitative and error analysis of the feedback generated by GPT-4 shows that it is good at capturing facts but may be prone to inferring too much implication from the given text, which provides insights into the use of LLMs in ASAG systems.
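The question-answering-based grading loop can be sketched minimally. Everything here is a hypothetical scaffold: the rubric-to-probe mapping, the `ask` callable (a stand-in for an LLM call returning a verdict and feedback), and the keyword-based toy oracle are assumptions for illustration, not SteLLA's actual prompts or pipeline.

```python
def grade_answer(student_answer, rubric, ask):
    """QA-based grading sketch: `rubric` maps each knowledge point to a
    probe; `ask(probe, answer)` stands in for an LLM call and returns
    (verdict, feedback). Returns (overall score in [0, 1], per-point detail)."""
    results = {}
    for point, probe in rubric.items():
        verdict, feedback = ask(probe, student_answer)
        results[point] = {"credit": 1.0 if verdict else 0.0, "feedback": feedback}
    score = sum(r["credit"] for r in results.values()) / len(rubric)
    return score, results

# Toy keyword oracle in place of an LLM judge.
def toy_ask(probe, answer):
    hit = probe in answer.lower()
    return hit, ("covered" if hit else "missing: " + probe)

rubric = {"location": "chloroplast", "energy source": "light"}
score, detail = grade_answer(
    "Photosynthesis happens in the chloroplast.", rubric, toy_ask)
print(score)  # -> 0.5 (one of two knowledge points covered)
```

Grading each knowledge point with its own probe is what yields the per-point "breakdown" grades and feedback described above, rather than a single holistic score.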
Convergence of Nearest Neighbor Pattern Classification with Selective Sampling
Joseph, Shaun N., Bakr, Seif Omar Abu, Lugo, Gabriel
In the panoply of pattern classification techniques, few enjoy the intuitive appeal and simplicity of the nearest neighbor rule: given a set of samples in some metric domain space whose value under some function is known, we estimate the function anywhere in the domain by giving the value of the nearest sample per the metric. More generally, one may use the modal value of the m nearest samples, where m is a fixed positive integer (although m=1 is known to be admissible in the sense that no larger value is asymptotically superior in terms of prediction error). The nearest neighbor rule is nonparametric and extremely general, requiring in principle only that the domain be a metric space. The classic paper on the technique, proving convergence under independent, identically-distributed (iid) sampling, is due to Cover and Hart (1967). Because taking samples is costly, there has been much research in recent years on selective sampling, in which each sample is selected from a pool of candidates ranked by a heuristic; the heuristic tries to guess which candidate would be the most "informative" sample. Lindenbaum et al. (2004) apply selective sampling to the nearest neighbor rule, but their approach sacrifices the austere generality of Cover and Hart; furthermore, their heuristic algorithm is complex and computationally expensive. Here we report recent results that enable selective sampling in the original Cover-Hart setting. Our results pose three selection heuristics and prove that their nearest neighbor rule predictions converge to the true pattern. Two of the algorithms are computationally cheap, with complexity growing linearly in the number of samples. We believe that these results constitute an important advance in the art.
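The m-nearest-neighbor rule described above is compact enough to state in code. The prediction function is the standard rule; the `farthest_first` heuristic shown with it is one cheap, linear-per-query selection strategy of the general kind discussed, offered as an assumption for illustration, not one of the paper's three proven heuristics.

```python
import math
from collections import Counter

def knn_predict(samples, labels, x, m=1):
    """Nearest neighbor rule: modal label among the m nearest samples
    (m=1 recovers the classic Cover-Hart rule)."""
    nearest = sorted(range(len(samples)),
                     key=lambda i: math.dist(samples[i], x))[:m]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def farthest_first(pool, labeled):
    """Illustrative selection heuristic (not the paper's): pick the
    candidate farthest from every labeled sample. Cost is
    O(|pool| * |labeled|) distance evaluations per selection."""
    return max(pool, key=lambda x: min(math.dist(x, s) for s in labeled))

samples = [(0, 0), (0, 1), (5, 5)]
labels = ["a", "a", "b"]
print(knn_predict(samples, labels, (0.2, 0.3), m=1))  # -> a
print(farthest_first([(1, 1), (6, 6)], samples))      # -> (6, 6)
```

Note that the rule needs only a metric (`math.dist` here, for Euclidean space), which is what preserves the generality of the Cover-Hart setting.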